Reconstruction techniques for improving the perceptual quality of binary masked speech.
نویسندگان
چکیده
This study proposes an approach to improve the perceptual quality of speech separated by binary masking through the use of reconstruction in the time-frequency domain. Non-negative matrix factorization and sparse reconstruction approaches are investigated, both using a linear combination of basis vectors to represent a signal. In this approach, the short-time Fourier transform (STFT) of separated speech is represented as a linear combination of STFTs from a clean speech dictionary. Binary masking for separation is performed using deep neural networks or Bayesian classifiers. The perceptual evaluation of speech quality, which is a standard objective speech quality measure, is used to evaluate the performance of the proposed approach. The results show that the proposed techniques improve the perceptual quality of binary masked speech, and outperform traditional time-frequency reconstruction approaches.
منابع مشابه
Speech Reconstruction from Binary Masked Spectrograms Using Vector Quantized Speaker Models
Several source separation techniques use binary masking on spectrograms to separate two or more speakers from each other. In this thesis, the possibilities for obtaining the best quality signal, reconstructed from masked spectrograms through vector quantized models of speakers, is investigated. The advantages and disadvantages of such an approach are examined. Additionally, the task of signal r...
متن کاملStudy on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One of the interesting topics on multimedia domain is concerned with empowering computer in order to speech production. Speech synthesis is granting human abilities to the computer for speech production. Data-based approach and process-based approach are the two main approaches on speech synthesis. Each approach has its varied challenges. Unit-selection speech synthesis and statistical parametr...
متن کاملEstimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality.
As a means of speech separation, time-frequency masking applies a gain function to the time-frequency representation of noisy speech. On the other hand, nonnegative matrix factorization (NMF) addresses separation by linearly combining basis vectors from speech and noise models to approximate noisy speech. This paper presents an approach for improving the perceptual quality of speech separated f...
متن کاملPerformance analysis of various single channel speech enhancement algorithms for automatic speech recognition
This paper analyzes the performance of various single channel speech enhancement systems when they are applied to automatic speech recognition (ASR) systems as a preprocessor. Until now the researches on speech enhancement algorithms have focused on improving the perceptual quality of speech signal. However, it has not been verified yet whether the improvements of the perceptual quality also in...
متن کاملL2 Learners’ Lexical Inferencing: Perceptual Learning Style Preferences, Strategy Use, Density of Text, and Parts of Speech as Possible Predictors
This study was intended first to categorize the L2 learners in terms of their learning style preferences and second to investigate if their learning preferences are related to lexical inferencing. Moreover, strategies used for lexical inferencing and text related issues of text density and parts of speech were studied to determine their moderating effects and the best predictors of lexical infe...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- The Journal of the Acoustical Society of America
دوره 136 2 شماره
صفحات -
تاریخ انتشار 2014